04:00
2026-05-28
arxiv.org
machine-learning
SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
Researchers report in an arXiv preprint that Batch Normalization causes gradient skew in dynamic sparse training (DST) methods, leading to slower convergence than in dense neural network training. Th…